64c Approximate Dynamic Programming Based Strategy for Markov Decision Problems in Process Control and Scheduling
Author
Abstract
Most interesting problems in process control and scheduling can be formulated as Markov Decision Problems (MDPs). These include real-time decision problems (e.g., feedback control and information-based rescheduling) that involve significant stochastic uncertainty. The optimal policy for an MDP can be derived by solving an associated stochastic dynamic programming (DP) problem. However, the computational complexity of stochastic dynamic programming makes it infeasible for most practical problems. This is why, in practice, one resorts to the popular approach of solving a deterministic optimal control problem at each time step with feedback update (as in MPC), an approach that can be highly suboptimal. The framework of Approximate Dynamic Programming (ADP) offers a promising avenue for pursuing the stochastic DP approach. In ADP, the computation is made feasible by pursuing a solution within a significantly restricted subset of the state space. The quality and computational complexity of the solution depend strongly on the choice of this subset. In ADP, this "working region" of the state space is identified by performing stochastic simulations of the closed-loop system under one or more known suboptimal policies. By solving the dynamic program within the state space defined by these simulations, one finds the best interpolated state trajectories for each encountered situation. In this way, the Bellman iteration in dynamic programming can be viewed as a means of blending the simulated suboptimal policies in an optimal manner. In this presentation, we will describe how to get around the 'curse of dimensionality' associated with the traditional solution approach to stochastic dynamic programming problems. We will then bring forth some key issues in applying the ADP approach to process control and scheduling problems. These include the choice of function approximator for the cost-to-go approximation and the restriction of the solution to the "working region" so that unreasonable extrapolations of the cost-to-go data are avoided. We will also identify some key situations where such an approach could offer a significant advantage over the existing approach. A number of examples drawn from process control and scheduling will be presented to make the case.
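To make the simulation-based idea concrete, the following is a minimal sketch of the two-step recipe the abstract describes (this is not from the paper; the toy one-dimensional problem, the heuristic policy, and all names are illustrative assumptions): states visited during closed-loop simulation under a known suboptimal policy define the "working region", and value iteration is then carried out only over those sampled states, with cost-to-go queries outside the sampled set resolved by nearest-neighbor lookup so that the approximation is never extrapolated beyond the simulated region.

```python
import random

# Toy 1-D stochastic control problem (illustrative assumption, not from the
# paper): states are integers 0..N, the goal is state 0, actions move
# -1/0/+1 with noise, and the stage cost grows with |state| and |action|.
N = 20
ACTIONS = (-1, 0, 1)
GAMMA = 0.95

def step(state, action, rng):
    """Noisy transition: the intended move succeeds with probability 0.8;
    otherwise a uniformly random move occurs."""
    move = action if rng.random() < 0.8 else rng.choice(ACTIONS)
    return min(max(state + move, 0), N)

def stage_cost(state, action):
    return abs(state) + 0.1 * abs(action)

def suboptimal_policy(state):
    """A known heuristic: always step toward 0 (ignores the noise/cost
    trade-off, hence suboptimal)."""
    return -1 if state > 0 else 0

def collect_working_region(episodes, horizon, seed=0):
    """Simulate the closed loop under the heuristic policy and record the
    visited states; this sampled set is the 'working region' on which the
    dynamic program is solved."""
    rng = random.Random(seed)
    visited = set()
    for _ in range(episodes):
        s = rng.randrange(N + 1)
        for _ in range(horizon):
            visited.add(s)
            s = step(s, suboptimal_policy(s), rng)
    return sorted(visited)

def bellman_iteration(states, sweeps=200):
    """Value iteration restricted to the sampled states. Cost-to-go queries
    at states outside the sampled set fall back to the nearest sampled state,
    a crude interpolation that avoids extrapolating the cost-to-go data."""
    J = {s: 0.0 for s in states}

    def cost_to_go(s):
        if s in J:
            return J[s]
        nearest = min(J, key=lambda x: abs(x - s))
        return J[nearest]

    for _ in range(sweeps):
        for s in states:
            def q(a):
                # Expectation over the 0.8-intended / 0.2-uniform transition.
                exp = 0.8 * cost_to_go(min(max(s + a, 0), N))
                for m in ACTIONS:
                    exp += (0.2 / 3) * cost_to_go(min(max(s + m, 0), N))
                return stage_cost(s, a) + GAMMA * exp
            J[s] = min(q(a) for a in ACTIONS)
    return J

region = collect_working_region(episodes=50, horizon=30)
J = bellman_iteration(region)
```

Because the Bellman minimization at each sampled state is free to pick actions the heuristic would not, the resulting policy can improve on every simulating policy while the computation stays confined to the (much smaller) sampled region rather than the full state space.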
Similar papers
Stochastic Reactive Production Scheduling by Multi-agent Based Asynchronous Approximate Dynamic Programming
The paper investigates a stochastic production scheduling problem with unrelated parallel machines. A closed-loop scheduling technique is presented that controls the production process on-line. To achieve this, the scheduling problem is reformulated as a special Markov Decision Process. A near-optimal control policy of the resulting MDP is calculated in a homogeneous multi-agent system. Each age...
Approximate Dynamic Programming For Sensor Management
This paper studies the problem of dynamic scheduling of multi-mode sensor resources for the problem of classification of multiple unknown objects. Because of the uncertain nature of the object types, the problem is formulated as a partially observed Markov decision problem with a large state space. The paper describes a hierarchical algorithm approach for efficient solution of sensor scheduling...
Sensor Scheduling for Target Tracking Using Approximate Dynamic Programming
To trade off tracking accuracy and interception risk in a multi-sensor multi-target tracking context, we study the sensor-scheduling problem where we aim to assign sensors to observe targets over time. Our problem is formulated as a partially observable Markov decision process, and this formulation is applied to develop a non-myopic sensor-scheduling scheme. We resort to extended Kalman filteri...
Expected Duration of Dynamic Markov PERT Networks
Abstract: In this paper, we apply stochastic dynamic programming to approximate the mean project completion time in dynamic Markov PERT networks. It is assumed that the activity durations are independent random variables with exponential distributions, but some social and economic problems influence the mean of the activity durations. It is also assumed that the social problems evolve in ac...
Modelling and Decision-making on Deteriorating Production Systems using Stochastic Dynamic Programming Approach
This study aimed at presenting a method for formulating optimal production, repair and replacement policies. The system was based on the production rate of defective parts and machine repairs and then was set up to optimize maintenance activities and related costs. The machine is either repaired or replaced. The machine is changed completely in the replacement process, but the productio...
Publication date: 2005